Phe-74 to Pro-81.

Gly-1 to Leu-6,
Gly-43 to Lys-48,
Ser-15 to Glu-23,
Val-33 to Gly-40,
Arg-47 to Pro-59,
Ser-64 to Phe-70,
Gly-92 to Lys-97,
Lys-155 to Glu-166.
Ser-20 to Ser-27.
Ser-48 to Leu-56,
Ser-70 to Val-75,
Thr-83 to Ala-92.
Asn-80 to Gly-86,
Glu-133 to Arg-138.

Leu-1 to Thr-7,
Pro-39 to Ser-48,
Trp-62 to Thr-68.

Pro-28 to Tyr-37,
Leu-52 to Trp-57,
Lys-109 to Leu-114.
Lys-28 to Tyr-49,
Arg-75 to Ala-83,
Thr-90 to Ala-95,
Arg-107 to Asp-114,
Arg-137 to Ser-146.

Arg-1 to Gly-6,
Lys-21 to Gly-29,
Gln-51 to Gly-56,
Pro-82 to Asp-90,
Lys-175 to Ser-180.
Pro-14 to Gly-22,
Ile-31 to Ala-36,
Lys-53 to Ile-65.

Val-2 to Leu-20.
Pro-1 to Lys-10,
Phe-31 to Asn-37,
Gln-42 to Leu-50,
Arg-58 to Gln-65,
Leu-86 to Ala-91,
Tyr-101 to Ala-111.
Gly-35 to Glu-41.

Gly-13 to Lys-20,
Ala-26 to Lys-32,
Ala-83 to Leu-88,
Gly-128 to Val-135,
Phe-142 to Asn-151.
Arg-8 to Gln-18,
Thr-24 to Gly-35,
Lys-53 to Gly-61,
Pro-77 to Cys-82,
Pro-103 to Gln-117.

Leu-14 to Arg-20,
Ser-29 to Ser-38,
Pro-43 to Gly-52.

Ile-34 to Ala-40.

Arg-70 to Ser-78.
Lys-36 to Gln-48,
Gln-56 to Ser-72,
Gly-82 to Val-105,
Lys-114 to Lys-120,
Ser-122 to Cys-133.

Ala-1 to Ser-6.
Ser-27 to Gly-38,
Ser-63 to Trp-72,
Trp-79 to Asn-84,
Arg-133 to Trp-139.
Glu-1 to Ser-9,
Pro-60 to Gln-66.
Phe-164 to Leu-170.

Gln-30 to Gly-36,
Lys-43 to Lys-54,
Thr-73 to Ile-79.

Lys-17 to Leu-27,
Thr-39 to Leu-44,
Asp-62 to Gly-70,
Arg-89 to Asp-94,
Arg-102 to Gly-113,
Lys-127 to Glu-132,
Thr-152 to Arg-160.

Gly-8 to Ala-15,
Leu-33 to His-50,
Phe-59 to Gly-64,
Gly-70 to Gly-84.

Thr-20 to Asp-27,
His-39 to Thr-44.
Asp-36 to Glu-46,
Gln-52 to Ile-61,
Leu-76 to Lys-87,
Asn-100 to Val-108,
Ser-120 to Gln-128.
Pro-7 to Gly-20.
Pro-38 to Gln-43.
Leu-9 to Val-14,
Glu-41 to Ala-49.
Gly-18 to Asp-23,
Gln-34 to Gly-40,
Asp-81 to Lys-87.

Asp-43 to Gly-48.
Glu-16 to Lys-21,
Asn-62 to Arg-68,
Asp-94 to Thr-102,
Gly-161 to Leu-170.

Ser-9 to Glu-16,
Phe-49 to Tyr-58,
Asn-102 to Gly-110.
Asp-24 to Leu-31,
Ile-123 to Gln-128.
Met-1 to Leu-6,
Gly-19 to Lys-26,
Ala-32 to Lys-38,
Lys-36 to Gly-43,
Ala-1 to Gly-8,
Leu-26 to Ser-40,
Pro-57 to Leu-95.
Leu-16 to Pro-22,
Val-30 to Asp-36.

[0040] The first column in Table 1 provides a unique "Clone ID NO:Z" for a cDNA
clone related to each contig sequence disclosed in Table 1. This clone ID references the
cDNA clone which contains at least the 5'-most sequence of the assembled contig, and at
least a portion of SEQ ID NO:X was determined by directly sequencing the referenced
clone. The referenced clone may contain more sequence than is described in the sequence
listing, or it may contain less. In the vast majority of cases, however, the clone is believed to
encode a full-length polypeptide. Where a clone is not full-length, a full-length
cDNA can be obtained by methods known in the art and/or as described elsewhere herein.
[0041] The second column in Table 1 provides a unique "Contig ID" identifier
for each contig sequence. The third column provides the "SEQ ID NO:X" identifier for
each of the ovarian-associated contig polynucleotide sequences disclosed in Table 1. The
fourth column, "ORF (From-To)", provides the location (i.e., nucleotide position numbers)
within the polynucleotide sequence "SEQ ID NO:X" that delineates the preferred open
reading frame (ORF) shown in the sequence listing and referenced in Table 1, column 5, as
SEQ ID NO:Y. Where the nucleotide position number "To" is lower than the nucleotide
position number "From", the preferred ORF is the reverse complement of the referenced
polynucleotide sequence.
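A minimal Python sketch of how an "ORF (From-To)" entry could be applied to SEQ ID NO:X follows. It assumes 1-based, inclusive coordinates, which the text does not state explicitly; `extract_orf`, `reverse_complement` and the toy contig are illustrative, not part of the disclosure.

```python
# Complement table for unambiguous DNA bases (IUPAC ambiguity codes are not handled).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(contig: str, start: int, end: int) -> str:
    """Return the ORF given by "From" (start) and "To" (end), 1-based and inclusive.

    If end < start, the ORF lies on the opposite strand: positions end..start
    are sliced from the given strand and reverse-complemented.
    """
    if not (1 <= start <= len(contig) and 1 <= end <= len(contig)):
        raise ValueError("ORF coordinates fall outside the contig")
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


if __name__ == "__main__":
    contig = "CCATGAAATAGGG"  # toy contig, not a sequence from this disclosure
    print(extract_orf(contig, 3, 11))  # ATGAAATAG
    # The same ORF, read from the opposite strand with "To" < "From":
    print(extract_orf(reverse_complement(contig), 11, 3))  # ATGAAATAG
```

Handling the reverse-complement case inside one function keeps the "To < From" rule in a single place, so callers can pass Table 1 coordinates unchanged.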

[0042] The fifth column in Table 1 provides the corresponding SEQ ID NO:Y for
the polypeptide sequence encoded by the preferred ORF delineated in column 4. In one
embodiment, the invention provides an amino acid sequence comprising, or alternatively
consisting of, a polypeptide encoded by the portion of SEQ ID NO:X delineated by "ORF
(From-To)". Also provided are polynucleotides encoding such amino acid sequences and
the complementary strand thereto.
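To illustrate how SEQ ID NO:Y relates to the ORF delineated in column 4, the sketch below translates an ORF with the standard genetic code (NCBI translation table 1). That the standard code applies and that translation ends at the first stop codon are assumptions of the sketch; `translate` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str, to_stop: bool = True) -> str:
    """Translate an ORF codon by codon; trailing bases that do not form a full codon are ignored.

    Codons containing ambiguity codes (e.g. N) become "X". With to_stop=True,
    translation ends at the first stop codon, which is not included.
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3].upper(), "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAATAG"))  # MK
```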

[0043] Column 6 in Table 1 lists residues comprising epitopes contained in the
polypeptides encoded by the preferred ORF (SEQ ID NO:Y), as predicted using the
algorithm of Jameson and Wolf (1988), Comp. Appl. Biosci. 4:181-186. The Jameson-Wolf
antigenic analysis was performed using the computer program PROTEAN (Version
3.11 for the Power Macintosh, DNASTAR, Inc., 1228 South Park Street, Madison, WI). In
specific embodiments, polypeptides of the invention comprise, or alternatively consist of, at
least one, two, three, four, five or more of the predicted epitopes as described in Table 1. It
will be appreciated that, depending on the analytical criteria used to predict antigenic
determinants, the exact boundaries of each determinant may vary slightly.
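Column 6 entries are written as residue ranges such as "Phe-74 to Pro-81". The sketch below parses such entries and slices the corresponding peptide out of a one-letter protein sequence, checking both boundary residues. It assumes 1-based, inclusive positions; the function names and the toy sequence are hypothetical, and the code does not reproduce the Jameson-Wolf prediction itself.

```python
import re

# Three-letter to one-letter amino acid codes (standard residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches entries written like "Phe-74 to Pro-81".
EPITOPE_PATTERN = re.compile(r"\b([A-Z][a-z]{2})-(\d+)\s+to\s+([A-Z][a-z]{2})-(\d+)")


def parse_epitopes(text):
    """Return (first residue, start, last residue, end) tuples found in a column-6 entry."""
    return [(r1, int(s), r2, int(e)) for r1, s, r2, e in EPITOPE_PATTERN.findall(text)]


def epitope_peptide(protein, epitope):
    """Slice one predicted epitope out of a one-letter protein sequence (SEQ ID NO:Y).

    Positions are taken as 1-based and inclusive, and both boundary residues are
    checked against the sequence as a guard against transcription errors.
    """
    first, start, last, end = epitope
    if start < 1 or end > len(protein) or start > end:
        raise ValueError("epitope coordinates fall outside the protein")
    peptide = protein[start - 1:end]
    if peptide[0] != THREE_TO_ONE[first] or peptide[-1] != THREE_TO_ONE[last]:
        raise ValueError("boundary residues do not match the sequence")
    return peptide


if __name__ == "__main__":
    epitopes = parse_epitopes("Gly-1 to Leu-6, Val-33 to Gly-40.")
    print(epitopes)  # [('Gly', 1, 'Leu', 6), ('Val', 33, 'Gly', 40)]
    print(epitope_peptide("GASTKLMV", epitopes[0]))  # GASTKL
```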

WO 02/00677

PCT/US01/18569

[0044] Column 7 in Table 1 provides an expression profile, in the form "library code: count",
for each of the contig sequences (SEQ ID NO:X) disclosed in Table 1. This profile can routinely
be combined with the information provided in Table 4 and used to determine the normal or
diseased tissues, cells, and/or cell line libraries which predominantly express the
polynucleotides of the invention. The first number in column 7 (preceding the colon)
represents the tissue/cell source identifier code corresponding to the code and description
provided in Table 4. For identifier codes whose first two letters are not "AR",
the second number in column 7 (following the colon) represents the number of times a
sequence corresponding to the reference polynucleotide sequence was identified in the
tissue/cell source. Identifier codes whose first two letters are
"AR" designate information generated using DNA array technology. Using this
technology, cDNAs were amplified by PCR and then transferred, in duplicate, onto the
array. Gene expression was assayed through hybridization of first-strand cDNA probes to
the DNA array. cDNA probes were generated from total RNA extracted from a variety of
different tissues and cell lines. Probe synthesis was performed in the presence of 33P-dCTP,
using oligo(dT) to prime reverse transcription. After hybridization, high-stringency washing
conditions were employed to remove non-specific hybrids from the array. The remaining
signal emanating from each gene target was measured using a Phosphorimager. Gene
expression was reported as Phosphor Stimulating Luminescence (PSL), which reflects the
level of phosphor signal generated from the probe hybridized to each of the gene targets
represented on the array. A local background signal subtraction was performed before the
total signal generated from each array was used to normalize gene expression between the
different hybridizations. The value presented after "[array code]:" represents the mean of
the duplicate values, following background subtraction and probe normalization. One of
skill in the art could routinely use this information to identify normal and/or diseased
tissue(s) which show a predominant expression pattern of the corresponding polynucleotide
of the invention or to identify polynucleotides which show predominant and/or specific
tissue and/or cell expression. The sequences disclosed herein have been determined to be
predominantly expressed in ovarian tissues, including normal and diseased ovarian tissues
(see Table 1, column 7, and Table 4).
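The two kinds of column-7 entries and the array processing steps described above can be sketched as follows. The "code:value" token layout, the example codes "AR051" and "H0123", clipping background-subtracted values at zero, and scaling every array to a common reference total are all assumptions; the text states only that local background was subtracted, that total array signal was used for normalization, and that duplicates were averaged.

```python
def parse_expression_profile(field):
    """Split a column-7 entry into {code: value}.

    Assumes "code:value" tokens separated by commas and/or whitespace.
    """
    profile = {}
    for token in field.replace(",", " ").split():
        code, _, value = token.partition(":")
        profile[code] = float(value)
    return profile


def is_array_code(code):
    """Codes starting with "AR" carry DNA-array values rather than library counts."""
    return code.startswith("AR")


def normalized_array_values(spots, backgrounds, reference_total):
    """Mean of duplicate spots after local background subtraction and total-signal scaling.

    spots, backgrounds: {gene: (duplicate_1, duplicate_2)} PSL readings from one array.
    reference_total: common total to which each array is scaled (assumed scheme).
    """
    corrected = {
        gene: [max(s - b, 0.0) for s, b in zip(spots[gene], backgrounds[gene])]
        for gene in spots
    }
    total = sum(sum(values) for values in corrected.values())
    if total == 0:
        raise ValueError("no signal above background on this array")
    scale = reference_total / total
    return {gene: scale * sum(values) / len(values) for gene, values in corrected.items()}


if __name__ == "__main__":
    print(parse_expression_profile("AR051:12.5, H0123:3"))  # {'AR051': 12.5, 'H0123': 3.0}
    spots = {"gene_a": (110.0, 90.0), "gene_b": (60.0, 40.0)}
    backgrounds = {"gene_a": (10.0, 10.0), "gene_b": (10.0, 10.0)}
    print(normalized_array_values(spots, backgrounds, reference_total=520.0))
    # {'gene_a': 180.0, 'gene_b': 80.0}
```

Scaling each array to the same total is one simple way to make hybridizations comparable; a different reference (for example a median total) would change the absolute values but not the ranking of genes within an array.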

[0045] Column 8 in Table 1 provides a chromosomal map location for certain
polynucleotides of the invention. Chromosomal location was determined by finding exact
matches to EST and cDNA sequences contained in the NCBI (National Center for
Biotechnology Information) UniGene database. Each sequence in the UniGene database is
assigned to a "cluster"; all of the ESTs, cDNAs, and STSs in a cluster are believed to be
derived from a single gene. Chromosomal mapping data are often available for one or more
sequences in a UniGene cluster; these data (if consistent) are then applied to the cluster as a
whole. Thus, it is possible to infer the chromosomal location of a new polynucleotide
sequence by determining its identity with a mapped UniGene cluster.
[0046] A modified version of the computer program BLASTN (Altschul et al., J.
Mol. Biol. 215:403-410 (1990), and Gish et al., Nat. Genet. 3:266-272 (1993)) was used to
search the UniGene database for EST or cDNA sequences that contain exact or near-exact
matches to a polynucleotide sequence of the invention (the 'Query'). A sequence from the
UniGene database (the 'Subject') was said to be an exact match if it contained a segment of
50 nucleotides in length such that 48 of those nucleotides were in the same order as found
in the Query sequence. If all of the matches that met this criterion were in the same UniGene
cluster, and mapping data were available for this cluster, the location is indicated in Table 1 under the
heading "Cytologic Band". Where a cluster had been further localized to a distinct cytologic
band, that band is disclosed; where no banding information was available, but the gene had
been localized to a single chromosome, the chromosome is disclosed.
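The 48-of-50 match criterion and the single-cluster rule can be sketched as below. This is a simplified, ungapped, forward-strand stand-in for the modified BLASTN search described above, not a reimplementation of it; the function names, cluster identifiers and band labels are hypothetical.

```python
def has_near_exact_match(query, subject, window=50, min_identical=48):
    """True if some 50-nt segment of subject matches a query segment at >= 48 positions.

    Ungapped comparison on the given strands only (a simplification of BLASTN).
    """
    for qi in range(len(query) - window + 1):
        q_seg = query[qi:qi + window]
        for si in range(len(subject) - window + 1):
            s_seg = subject[si:si + window]
            if sum(a == b for a, b in zip(q_seg, s_seg)) >= min_identical:
                return True
    return False


def infer_location(query, clusters, cluster_locations):
    """Return a mapped location only if every matching subject lies in one cluster.

    clusters: {cluster_id: [member sequences]}
    cluster_locations: {cluster_id: band or chromosome}; unmapped clusters are absent.
    """
    hit_clusters = {
        cluster_id
        for cluster_id, members in clusters.items()
        if any(has_near_exact_match(query, member) for member in members)
    }
    if len(hit_clusters) != 1:
        return None  # no match, or matches spread over several clusters
    (cluster_id,) = hit_clusters
    return cluster_locations.get(cluster_id)
```

The brute-force scan is quadratic in sequence length and is meant only to make the criterion concrete; a production search would use an indexed aligner.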
[0047] Once a presumptive chromosomal location was determined for a
polynucleotide of the invention, an associated disease locus was identified by comparison
with a database of diseases which have been experimentally associated with genetic loci.
The database used was the Morbid Map, derived from OMIM™ (supra). If the putative
chromosomal location of a polynucleotide of the invention (Query sequence) was
associated with a disease in the Morbid Map database, an OMIM reference identification
number was noted in column 9, Table 1, labeled "OMIM Disease Reference(s)". Table 5 is
a key to the OMIM reference identification numbers (column 1) and provides a description
of the associated disease in column 2.
proteolipid protein 2
[0048] Table 2 further characterizes certain encoded polypeptides of the
invention by providing the results of comparisons to protein and protein family databases.
The first column provides a unique clone identifier, "Clone ID NO:", corresponding to a
cDNA clone disclosed in Table 1. The second column provides the unique contig
identifier, "Contig ID:", which allows correlation with the information in Table 1. The
third column provides the sequence identifier, "SEQ ID NO:X", for the contig
polynucleotide sequences. The fourth column provides the analysis method by which the
homology/identity disclosed in the row was determined. The fifth column provides a
description of PFam/NR hits having significant matches identified by each analysis.
Column six provides the accession number of the PFam/NR hit disclosed in the fifth
column. Column seven, "Score/Percent Identity", provides a quality score or the percent
identity of the hit disclosed in column five. Comparisons were made between
polypeptides encoded by polynucleotides of the invention and a non-redundant protein
database (herein referred to as "NR"), or a database of protein families (herein referred to
as "PFam"), as described below.

[0049] The NR database, which comprises the NBRF PIR database, the NCBI
GenPept database, and the SIB SwissProt and TrEMBL databases, was made non-redundant
using the computer program nrdb2 (Warren Gish, Washington University in
Saint Louis). Each of the polynucleotides shown in Table 1, column 3 (e.g., SEQ ID
NO:X or the 'Query' sequence) was used to search against the NR database. The computer
program BLASTX was used to compare a 6-frame translation of the Query sequence to
the NR database (for information about the BLASTX algorithm, please see Altschul et al., J.
Mol. Biol. 215:403-410 (1990), and Gish et al., Nat. Genet. 3:266-272 (1993)). A
description of the sequence that is most similar to the Query sequence (the highest-scoring
'Subject') is shown in column five of Table 2, and the database accession number for that
sequence is provided in column six. The highest-scoring 'Subject' is reported in Table 2 if
(a) the estimated probability that the match occurred by chance alone is less than 1.0e-07,
and (b) the match was not to a known repetitive element. BLASTX returns alignments of
short polypeptide segments of the Query and Subject sequences which share a high degree
of similarity; these segments are known as High-Scoring Segment Pairs, or HSPs. Table 2
reports the degree of similarity between the Query and the Subject for each HSP as a
percent identity in column 7. The percent identity is calculated by dividing the number
of exact matches between the two aligned sequences in the HSP by the number
of Query amino acids in the HSP and multiplying by 100. The polynucleotides of SEQ ID
NO:X which encode the polypeptide sequence that generates an HSP are delineated by
columns 8 and 9 of Table 2.
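The percent-identity calculation and the two reporting conditions can be written out directly. In this sketch, aligned HSP strings use "-" for gaps, and gap columns in the Query are not counted as Query amino acids (an interpretation the text does not spell out); the hit dictionary keys are hypothetical.

```python
def hsp_percent_identity(query_aln, subject_aln):
    """Exact matches divided by the number of Query residues in the HSP, times 100.

    Aligned strings use "-" for gaps; gap columns in the Query are not counted as residues.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    matches = sum(q == s != "-" for q, s in zip(query_aln, subject_aln))
    query_residues = sum(q != "-" for q in query_aln)
    return 100.0 * matches / query_residues


def best_reportable_hit(hits, max_p=1.0e-07):
    """Highest-scoring hit whose chance probability is below max_p and that is not a repeat.

    hits: list of dicts with hypothetical keys "accession", "score", "p_chance", "is_repeat".
    Returns None when no hit qualifies.
    """
    eligible = [h for h in hits if h["p_chance"] < max_p and not h["is_repeat"]]
    return max(eligible, key=lambda h: h["score"], default=None)


if __name__ == "__main__":
    print(hsp_percent_identity("MKT-AY", "MKTQAF"))  # 80.0
```

In the example, four of the five Query residues (M, K, T, A) match, giving 80 percent; the gap column opposite Q is excluded from both counts.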

[0050] The PFam database, PFam version 5.2 (Sonnhammer et al., Nucl. Acids
Res. 26:320-322 (1998)), consists of a series of multiple sequence alignments, one
alignment for each protein family. Each multiple sequence alignment is converted into a
probability model called a Hidden Markov Model, or HMM, that represents the position-specific
variation among the sequences that make up the multiple sequence alignment
(see, e.g., R. Durbin et al., Biological sequence analysis: probabilistic models of proteins
and nucleic acids, Cambridge University Press, 1998, for the theory of HMMs). The
program HMMER version 1.8 (Sean Eddy, Washington University in Saint Louis) was
used to compare the predicted protein sequence for each Query sequence (SEQ ID NO:Y
in Table 1) to each of the HMMs derived from PFam version 5.2. An HMM derived from
PFam version 5.2 was said to be a significant match to a polypeptide of the invention if
the score returned by HMMER 1.8 was greater than 0.8 times the HMMER 1.8 score
obtained with the most distantly related known member of that protein family. The
description of the PFam family which shares a significant match with a polypeptide of the
invention is listed in column 5 of Table 2, and the database accession number of the PFam
hit is provided in column 6. Column 7 provides the score returned by HMMER version
1.8 for the alignment. Columns 8 and 9 delineate the polynucleotides of SEQ ID NO:X
which encode the polypeptide sequence which shows a significant match to a PFam
protein family.
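The PFam significance rule reduces to a threshold comparison. The sketch assumes that the score "obtained with the most distantly related known member" is available as the lowest score among a family's known members; that proxy, the function name and the family labels are assumptions, and no HMMER scoring is performed here.

```python
def significant_families(query_scores, member_scores, factor=0.8):
    """Families whose HMMER score for the query exceeds factor x the distant-member score.

    query_scores: {family: score of the query polypeptide against that family's HMM}
    member_scores: {family: scores of the family's known members}; the lowest one is
    used here as a stand-in for the most distantly related member (assumption).
    """
    significant = {}
    for family, score in query_scores.items():
        known = member_scores.get(family)
        if known and score > factor * min(known):
            significant[family] = score
    return significant


if __name__ == "__main__":
    query_scores = {"FAM_A": 50.0, "FAM_B": 10.0}
    member_scores = {"FAM_A": [120.0, 55.0], "FAM_B": [30.0, 20.0]}
    print(significant_families(query_scores, member_scores))  # {'FAM_A': 50.0}
```

Here FAM_A passes because 50 exceeds 0.8 x 55 = 44, while FAM_B fails because 10 does not exceed 0.8 x 20 = 16.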

[0051] As mentioned, columns 8 and 9 in Table 2, "NT From" and "NT To",
delineate the polynucleotides of "SEQ ID NO:X" that encode a polypeptide having a
significant match to the PFam/NR database as disclosed in the fifth column of Table 2. In
one embodiment, the invention provides a protein comprising, or alternatively consisting
of, a polypeptide encoded by the polynucleotides of SEQ ID NO:X delineated in columns
8 and 9 of Table 2. Also provided are polynucleotides encoding such proteins, and the
complementary strand thereto.


[0052] The nucleotide sequence SEQ ID NO:X and the translated SEQ ID NO:Y
are sufficiently accurate and otherwise suitable for a variety of uses well known in the art
and described further below. For instance, the nucleotide sequences of SEQ ID NO:X are
useful for designing nucleic acid hybridization probes that will detect nucleic acid
sequences contained in SEQ ID NO:X or the cDNA contained in Clone ID NO:Z. These
probes will also hybridize to nucleic acid molecules in biological samples, thereby
enabling immediate applications in chromosome mapping, linkage analysis, tissue
identification and/or typing, and a variety of forensic and diagnostic methods of the
invention. Similarly, polypeptides identified from SEQ ID NO:Y may be used to generate
antibodies which bind specifically to these polypeptides, or fragments thereof, and/or to
the polypeptides encoded by the cDNA clones identified in, for example, Table 1.
[0053] Nevertheless, DNA sequences generated by sequencing reactions can
contain sequencing errors. The errors exist as misidentified nucleotides, or as insertions or
deletions of nucleotides in the generated DNA sequence. The erroneously inserted or
deleted nucleotides cause frame shifts in the reading frames of the predicted amino acid
sequence. In these cases, the predicted amino acid sequence diverges from the actual
amino acid sequence, even though the generated DNA sequence may be greater than
99.9% identical to the actual DNA sequence (for example, one base insertion or deletion
in an open reading frame of over 1000 bases).
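The frameshift effect described above can be reproduced with a toy sequence. The hypothetical 1,002-nt ORF below loses a single base, which leaves the two sequences more than 99.9% identical at the nucleotide level yet changes every residue downstream of the error. Only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate every complete codon (standard code); stop codons appear as "*"."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Hypothetical 1,002-nt ORF: ATG, 332 GCT (Ala) codons, TAA stop.
actual = "ATG" + "GCT" * 332 + "TAA"
# A sequencing error deletes one base (0-based index 300, first base of codon 101).
observed = actual[:300] + actual[301:]

nt_identity = 100.0 * (len(actual) - 1) / len(actual)
actual_protein = translate(actual)
observed_protein = translate(observed)
first_difference = next(
    i for i, (a, b) in enumerate(zip(actual_protein, observed_protein)) if a != b
)

print(f"nucleotide identity: {nt_identity:.2f}%")  # 99.90%
print(f"proteins diverge at residue {first_difference + 1}")  # 101
print(observed_protein[first_difference:first_difference + 5])  # LLLLL
```

The first 100 residues agree, but from residue 101 onward the shifted frame reads CTG (Leu) instead of GCT (Ala) and the stop codon is lost, even though only one of 1,002 bases differs.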

[0054] Accordingly, for those applications requiring precision in the nucleotide
sequence or the amino acid sequence, the present invention provides not only the
generated nucleotide sequence identified as SEQ ID NO:X and a predicted translated
amino acid sequence identified as SEQ ID NO:Y, but also a sample of plasmid DNA
containing cDNA Clone ID NO:Z (deposited with the ATCC on June 5, 2000, and
given ATCC Deposit Nos. PTA-1982 and PTA-1985; and/or as set forth, for example, in
Tables 1, 6 and 7). The nucleotide sequence of each deposited clone can readily be
determined by sequencing the deposited clone in accordance with known methods.
Further, techniques known in the art can be used to verify the nucleotide sequences of
SEQ ID NO:X.
[0055] The predicted amino acid sequence can then be verified from such deposits.
Moreover, the amino acid sequence of the protein encoded by a particular clone can also
be directly determined by peptide sequencing or by expressing the protein in a suitable
host cell containing the deposited human cDNA, collecting the protein, and determining
its sequence.

RACE Protocol For Recovery of Full-Length Genes 

[0056] Partial cDNA clones can be made full-length by utilizing the rapid
amplification of cDNA ends (RACE) procedure described in Frohman, M.A., et al., Proc.
Nat'l. Acad. Sci. USA, 85:8998-9002 (1988). A cDNA clone missing either the 5' or 3'
end can be reconstructed to include the absent base pairs extending to the translational
start or stop codon, respectively. In some cases, cDNAs are missing the start codon of
translation. The following briefly describes a modification of this original 5' RACE
procedure. Poly A+ or total RNA is reverse transcribed with Superscript II (Gibco/BRL)
and an antisense or complementary primer specific to the cDNA sequence. The primer is
removed from the reaction with a Microcon Concentrator (Amicon). The first-strand
cDNA is then tailed with dATP and terminal deoxynucleotidyl transferase (Gibco/BRL),
producing the anchor sequence needed for PCR amplification. The
second strand is synthesized from the dA-tail in PCR buffer with Taq DNA polymerase
(Perkin-Elmer Cetus), an oligo-dT primer containing three adjacent restriction sites (XhoI,
SalI and ClaI) at the 5' end, and a primer containing just these restriction sites. This
double-stranded cDNA is PCR amplified for 40 cycles with the same primers as well as a
nested cDNA-specific antisense primer. The PCR products are size-separated on an
ethidium bromide-agarose gel, and the region of the gel containing cDNA products of the
predicted size of the missing protein-coding DNA is removed. cDNA is purified from the
agarose with the Magic PCR Prep kit (Promega), restriction digested with XhoI or SalI,
and ligated to a plasmid such as pBluescript SKII (Stratagene) at XhoI and EcoRV sites.
This DNA is transformed into bacteria, and the plasmid clones are sequenced to identify the
correct protein-coding inserts. Correct 5' ends are confirmed by comparing this sequence
with the putatively identified homologue and by its overlap with the partial cDNA clone. Similar
methods known in the art and/or commercial kits are used to amplify and recover 3' ends.
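When an oligo-dT anchor primer is designed, or a recovered insert is checked before the XhoI or SalI digestion, a simple site scan is useful. The sketch below looks for the XhoI (CTCGAG), SalI (GTCGAC) and ClaI (ATCGAT) recognition sequences; because all three are palindromic, scanning one strand finds sites on both. The primer sequence in the example is illustrative only, and the helper names are hypothetical.

```python
# Recognition sequences of the three enzymes named in the protocol (5'->3').
SITES = {"XhoI": "CTCGAG", "SalI": "GTCGAC", "ClaI": "ATCGAT"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} of every exact recognition-site match."""
    seq = seq.upper()
    return {
        enzyme: [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]
        for enzyme, motif in sites.items()
    }


def has_internal_cloning_site(insert):
    """True if the insert itself contains an XhoI or SalI site and would be cut during the digest."""
    hits = find_sites(insert)
    return bool(hits["XhoI"] or hits["SalI"])


# Illustrative anchor primer: three adjacent sites followed by an oligo(dT) stretch.
anchor = "GACTCGAGTCGACATCGA" + "T" * 17
print(find_sites(anchor))  # {'XhoI': [2], 'SalI': [7], 'ClaI': [13]}
```

An insert flagged by `has_internal_cloning_site` would be cleaved internally by the XhoI or SalI digestion, so a different enzyme or cloning strategy would be needed for it.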
[0057] Several quality-controlled kits are commercially available for purchase.
Similar reagents and methods to those above are supplied in kit form from Gibco/BRL for
both 5' and 3' RACE for recovery of full-length genes. A second kit is available from
Clontech which is a modification of a related technique, SLIC (single-stranded ligation to
single-stranded cDNA), developed by Dumas et al., Nucleic Acids Res., 19:5227-32
(1991). The major differences in procedure are that the RNA is alkaline hydrolyzed after
reverse transcription and RNA ligase is used to join a restriction site-containing anchor
primer to the first-strand cDNA. This obviates the necessity for the dA-tailing reaction,
which results in a polyT stretch that is difficult to sequence past.

[0058] An alternative to generating 5' or 3' cDNA from RNA is to use cDNA
library double-stranded DNA. An asymmetric PCR-amplified antisense cDNA strand is
synthesized with an antisense cDNA-specific primer and a plasmid-anchored primer.
These primers are removed and a symmetric PCR reaction is performed with a nested
cDNA-specific antisense primer and the plasmid-anchored primer.

RNA Ligase Protocol For Generating The 5' or 3' End Sequences To Obtain Full Length Genes

[0059] Once a gene of interest is identified, several methods are available for the
identification of the 5' or 3' portions of the gene which may not be present in the original
cDNA plasmid. These methods include, but are not limited to, filter probing, clone
enrichment using specific probes, and protocols similar or identical to 5' and 3' RACE.
While the full-length gene may be present in the library and can be identified by probing, a
useful method for generating the 5' or 3' end is to use the existing sequence information
from the original cDNA to generate the missing information. A method similar to 5'
RACE is available for generating the missing 5' end of a desired full-length gene (this
method was published by Fromont-Racine et al., Nucleic Acids Res., 21(7):1683-1684
(1993)). Briefly, a specific RNA oligonucleotide is ligated to the 5' ends of a population
of RNA presumably containing full-length gene RNA transcripts. A primer set containing
a primer specific to the ligated RNA oligonucleotide and a primer specific to a known
sequence of the gene of interest is used to PCR amplify the 5' portion of the desired
full-length gene, which may then be sequenced and used to generate the full-length gene. This
method starts with total RNA isolated from the desired source; poly A+ RNA may be used
but is not a prerequisite for this procedure. The RNA preparation may then be treated with
phosphatase if necessary to eliminate 5' phosphate groups on degraded or damaged RNA
which may interfere with the later RNA ligase step. The phosphatase, if used, is then
inactivated and the RNA is treated with tobacco acid pyrophosphatase in order to remove
the cap structure present at the 5' ends of messenger RNAs. This reaction leaves a 5'
phosphate group at the 5' end of the cap-cleaved RNA, which can then be ligated to an
RNA oligonucleotide using T4 RNA ligase. This modified RNA preparation can then be
used as a template for first-strand cDNA synthesis using a gene-specific oligonucleotide.
The first-strand synthesis reaction can then be used as a template for PCR amplification of
the desired 5' end using a primer specific to the ligated RNA oligonucleotide and a primer
specific to the known sequence of the ovarian antigen of interest. The resultant product is
then sequenced and analyzed to confirm that the 5' end sequence belongs to the relevant
ovarian antigen.

[0060] The present invention also relates to vectors or plasmids which include
such DNA sequences, as well as the use of the DNA sequences. The material deposited
with the ATCC (deposited on June 5, 2000, and given ATCC Deposit
Nos. PTA-1982 and PTA-1985; and/or as set forth, for example, in Tables 1, 6 and 7) is a
mixture of cDNA clones derived from a variety of human tissues and cloned in either a
plasmid vector or a phage vector, as shown, for example, in Table 7. These deposits are
referred to as "the deposits" herein. The tissues from which some of the clones were
derived are listed in Table 7, and the vector in which the corresponding cDNA is
contained is also indicated in Table 7. The deposited material includes cDNA clones
corresponding to SEQ ID NO:X described, for example, in Table 1 (Clone ID NO:Z). A
clone which is isolatable from the ATCC Deposits by use of a sequence listed as SEQ ID
NO:X may include the entire coding region of a human gene or, in other cases, may
include a substantial portion of the coding region of a human gene. Furthermore,
although the sequence listing may in some instances list only a portion of the DNA
sequence in a clone included in the ATCC Deposits, it is well within the ability of one
skilled in the art to sequence the DNA included in a clone contained in the ATCC
Deposits by use of a sequence (or portion thereof) described in, for example, Tables 1A or
2 by procedures hereinafter further described, and others apparent to those skilled in the
art.

[0061] Also provided in Table 7 is the name of the vector which contains the
cDNA clone. Each vector is routinely used in the art. The following additional
information is provided for convenience.

[0062] Vectors Lambda Zap (U.S. Patent Nos. 5,128,256 and 5,286,636), Uni-Zap
XR (U.S. Patent Nos. 5,128,256 and 5,286,636), Zap Express (U.S. Patent Nos. 5,128,256
and 5,286,636), pBluescript (pBS) (Short, J. M. et al., Nucleic Acids Res. 16:7583-7600
(1988); Alting-Mees, M. A. and Short, J. M., Nucleic Acids Res. 17:9494 (1989)) and
pBK (Alting-Mees, M. A. et al., Strategies 5:58-61 (1992)) are commercially available
from Stratagene Cloning Systems, Inc., 11011 N. Torrey Pines Road, La Jolla, CA, 92037.
pBS contains an ampicillin resistance gene and pBK contains a neomycin resistance gene.
Phagemid pBS may be excised from the Lambda Zap and Uni-Zap XR vectors, and
phagemid pBK may be excised from the Zap Express vector. Both phagemids may be
transformed into E. coli strain XL-1 Blue, also available from Stratagene.
[0063] Vectors pSport1, pCMVSport 1.0, pCMVSport 2.0 and pCMVSport 3.0
were obtained from Life Technologies, Inc., P. O. Box 6009, Gaithersburg, MD 20897.
All Sport vectors contain an ampicillin resistance gene and may be transformed into E.
coli strain DH10B, also available from Life Technologies. See, for instance, Gruber, C.
R., et al., Focus 15:59- (1993). Vector lafmid BA (Bento Soares, Columbia University,
New York, NY) contains an ampicillin resistance gene and can be transformed into E. coli
strain XL-1 Blue. Vector pCR®2.1, which is available from Invitrogen, 1600 Faraday
Avenue, Carlsbad, CA 92008, contains an ampicillin resistance gene and may be
transformed into E. coli strain DH10B, available from Life Technologies. See, for
instance, Clark, J. M., Nuc. Acids Res. 16:9677-9686 (1988) and Mead, D. et al.,
Bio/Technology 9: (1991).

[0064] The present invention also relates to the genes corresponding to SEQ ID
NO:X, SEQ ID NO:Y, and/or the deposited clone (Clone ID NO:Z). The corresponding
gene can be isolated in accordance with known methods using the sequence information
disclosed herein. Such methods include preparing probes or primers from the disclosed
sequence and identifying or amplifying the corresponding gene from appropriate sources
of genomic material.

[0065] Also provided in the present invention are allelic variants, orthologs, and/or
species homologs. Procedures known in the art can be used to obtain full-length genes,
allelic variants, splice variants, full-length coding portions, orthologs, and/or species
homologs of ovarian-associated genes corresponding to SEQ ID NO:X or the complement
thereof, polypeptides encoded by SEQ ID NO:X or the complement thereof, and/or the
cDNA contained in Clone ID NO:Z, using information from the sequences disclosed
herein or the clones deposited with the ATCC. For example, allelic variants and/or species
homologs may be isolated and identified by making suitable probes or primers from the
sequences provided herein and screening a suitable nucleic acid source for allelic variants
and/or the desired homologue.

[0066] The polypeptides of the invention can be prepared in any suitable manner.
Such polypeptides include isolated naturally occurring polypeptides, recombinantly
produced polypeptides, synthetically produced polypeptides, or polypeptides produced by
a combination of these methods. Means for preparing such polypeptides are well
understood in the art.

[0067] The polypeptides may be in the form of the secreted protein, including the
mature form, or may be a part of a larger protein, such as a fusion protein (see below). It
is often advantageous to include an additional amino acid sequence which contains
secretory or leader sequences, pro-sequences, sequences which aid in purification, such as
multiple histidine residues, or an additional sequence for stability during recombinant
production.

[0068] The polypeptides of the present invention are preferably provided in an
isolated form, and preferably are substantially purified. A recombinantly produced
version of a polypeptide, including the secreted polypeptide, can be substantially purified
using techniques described herein or otherwise known in the art, such as, for example, by
the one-step method described in Smith and Johnson, Gene 67:31-40 (1988).
Polypeptides of the invention also can be purified from natural, synthetic or recombinant
sources using techniques described herein or otherwise known in the art, such as, for
example, antibodies of the invention raised against the ovarian polypeptides of the present
invention in methods which are well known in the art.

[0069] The present invention provides a polynucleotide comprising, or
alternatively consisting of, the nucleic acid sequence of SEQ ID NO:X and/or the cDNA
sequence contained in Clone ID NO:Z. The present invention also provides a polypeptide
comprising, or alternatively consisting of, the polypeptide sequence of SEQ ID NO:Y, a
polypeptide encoded by SEQ ID NO:X or a complement thereof, and/or a polypeptide
encoded by the cDNA contained in Clone ID NO:Z. Polynucleotides encoding a
polypeptide comprising, or alternatively consisting of, the polypeptide sequence of SEQ ID
NO:Y, a polypeptide encoded by SEQ ID NO:X, and/or a polypeptide encoded by the
cDNA contained in Clone ID NO:Z are also encompassed by the invention. The present
invention further encompasses a polynucleotide comprising, or alternatively consisting of,
the complement of the nucleic acid sequence of SEQ ID NO:X, a nucleic acid sequence
encoding a polypeptide encoded by the complement of the nucleic acid sequence of SEQ
ID NO:X, and/or the cDNA contained in Clone ID NO:Z.

[0070] Many polynucleotide sequences, such as EST sequences, are publicly
available and accessible through sequence databases and may have been publicly available
prior to conception of the present invention. Preferably, such related polynucleotides are
specifically excluded from the scope of the present invention. Accordingly, for each
contig sequence (SEQ ID NO:X) listed in the third column of Table 1, preferably excluded
are one or more polynucleotides comprising a nucleotide sequence described by the
general formula a-b, where a is any integer between 1 and the final nucleotide minus
15 of SEQ ID NO:X, b is an integer of 15 to the final nucleotide of SEQ ID NO:X, both
a and b correspond to the positions of nucleotide residues shown in SEQ ID NO:X,
and b is greater than or equal to a + 14. More specifically, preferably excluded are
one or more polynucleotides comprising a nucleotide sequence described by the general
formula a-b, where a and b are integers as defined in columns 4 and 5, respectively, of
Table 3. In specific embodiments, the polynucleotides of the invention do not consist of at
least one, two, three, four, five, ten, or more of the specific polynucleotide sequences
referenced by the GenBank Accession No. as disclosed in column 6 of Table 3. In further
embodiments, preferably excluded from the invention are the specific polynucleotide
sequence(s) contained in the clones corresponding to at least one, two, three, four, five,
ten, or more of the available material having the accession numbers identified in the sixth
column of Table 3. This listing is in no way meant to encompass all of the sequences
which may be excluded by the general formula; it is merely a representative example. All
references available through these accessions are hereby incorporated by reference in their
entirety.
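The general formula a-b can be made concrete with a short check. The sketch encodes the stated conditions (1 <= a <= n - 15, 15 <= b <= n, b >= a + 14, for an n-nucleotide SEQ ID NO:X) and counts how many (a, b) pairs they cover. For each a, b runs from a + 14 to n, giving n - a - 13 choices, so summing over a = 1 to n - 15 gives 2 + 3 + ... + (n - 14) = (n - 14)(n - 13)/2 - 1. The function names are hypothetical.

```python
def is_excluded_range(a, b, n):
    """True if positions a..b of an n-nt SEQ ID NO:X fit the general formula a-b.

    Conditions as stated: 1 <= a <= n - 15, 15 <= b <= n, and b >= a + 14
    (i.e. the fragment is at least 15 nucleotides long).
    """
    return 1 <= a <= n - 15 and 15 <= b <= n and b >= a + 14


def count_excluded_ranges(n):
    """Closed-form count of (a, b) pairs covered by the formula for an n-nt sequence.

    For each a, b runs from a + 14 to n (n - a - 13 choices); summing over
    a = 1..n - 15 gives 2 + 3 + ... + (n - 14) = (n - 14)(n - 13)/2 - 1.
    """
    if n < 16:
        return 0
    return (n - 14) * (n - 13) // 2 - 1


if __name__ == "__main__":
    print(count_excluded_ranges(16))  # 2
    print(is_excluded_range(1, 15, 100), is_excluded_range(86, 100, 100))  # True False
```

Note that, as written, the upper bound a <= n - 15 excludes the single 15-mer that ends exactly at the final nucleotide (a = n - 14); the sketch follows the formula literally rather than correcting it.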